Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache the collection schema, only read from vatstore once per crank #7333

Merged
merged 6 commits into from
Apr 11, 2023

Conversation

warner
Copy link
Member

@warner warner commented Apr 5, 2023

This changes the collectionManager to keep the per-collection
metadata (keyShape, valueShape, label) in a cache. Like the VOM's
dataCache, this schemaCache is populated at most once per crank, and
flushed at the end of every crank. This removes the GC-sensitive
syscalls that used to fetch the metadata when userspace caused a
collection to be reanimated. Instead, the cache is read when userspace
invokes a collection method that needs the data (most of them, to
validate keys or values), regardless of whether the collection
Representative remains in RAM or not.

It also removes the now-unused in-RAM list of all collections, which used to support deleteAllVirtualCollections, called from stopVat.

closes #6360
refs #5058
closes #7142

@warner warner added SwingSet package: SwingSet liveslots requires vat-upgrade to deploy changes labels Apr 5, 2023
@warner warner requested a review from FUDCo April 5, 2023 06:41
@warner warner self-assigned this Apr 5, 2023
@warner
Copy link
Member Author

warner commented Apr 5, 2023

@FUDCo the last commit changes the serialized |schemata value from [keyShape, valueShape] to { keyShape, valueShape } (from array to object/record). I thought that make future extensions more straightforward, but I could be talked out of it, especially since it grows the stored size slightly (it's storing an explicit undefined for both, but maybe we should change that, by building up the serialized object one property at a time). I'm moderately sure that there's no functional difference between e.g. { keyShape, valueShape: undefined } and just { keyShape }.

So two questions for you to think about:

  • avoid storing values of undefined in the record?
  • record vs array?

A performance note: this approach does either zero or one reads of |schemata and |label per collection per crank, triggered by pretty much any collection method (since nearly all of them want to validate against keyShape). The schemata it reads is flushed at end-of-crank. This makes the syscall trace deterministic, but I'd respect an argument that once-per-crank is too frequent. The next most logical thing to do would be to hold them across multiple cranks, until a BOYD flushes the cache. That frequency felt too slow (i.e. too high RAM usage) for the VOM dataCache, but the virtual-collection schemaCache holds a lot less data, so we might choose a different tradeoff.

@mhofman this ought to remove the gc-sensitivity due to collections being collected and then reanimated. And #7138 removed another. I think there's one known source of sensitivity for me to try to fix, involving KindHandles being reanimated (the other part of #7142).

@warner warner marked this pull request as draft April 5, 2023 07:17
@warner
Copy link
Member Author

warner commented Apr 5, 2023

oops, something is still broken, will re-submit when ready

@warner warner force-pushed the 6360-schema-cache branch from 10cc46d to 91f525f Compare April 5, 2023 08:28
@warner
Copy link
Member Author

warner commented Apr 5, 2023

Ok I went ahead and changed the serialized schemata object to be empty if there were no keyShape/valueShape constraints given. In practice, there's always a keyShape, because the default is M.scalar(), but this still saves some space in the serialized form when valueShape = undefined (which is really common)

I'm storing schemataCD (capdata) in the cache. This isn't obviously useful for this PR, but there's another one stacked up on top of it (#7334, for #7321) that needs the .slots to do refcounting on.

@warner warner marked this pull request as ready for review April 5, 2023 08:34
@mhofman
Copy link
Member

mhofman commented Apr 5, 2023

I went ahead and changed the serialized schemata object to be empty if there were no keyShape/valueShape constraints given

Quick question: is there any reason not to merge the schemata and label vatStore entries into a single one? Especially now that we have a record for the schemata.

@warner
Copy link
Member Author

warner commented Apr 5, 2023

I went ahead and changed the serialized schemata object to be empty if there were no keyShape/valueShape constraints given

Quick question: is there any reason not to merge the schemata and label vatStore entries into a single one? Especially now that we have a record for the schemata.

I think that'd be fine. We've got two mutable metadata keys (|entryCount, updated each time we add/remove something from a collection, and |nextOrdinal, incremented when we add a new vref-based key), and two immutable ones (|label and |schemata). We'll want to keep the mutable ones as they are, they have different "temperatures". But merging all the immutables ones makes sense, and then naming them something closer to |info.

@FUDCo can you remember why they were split?

@FUDCo
Copy link
Contributor

FUDCo commented Apr 5, 2023

@FUDCo can you remember why they were split?

I don't think there was a deep reason. Mainly, I think, to avoid having to parse the value to separate two relatively unrelated pieces of information from each other.

@warner
Copy link
Member Author

warner commented Apr 5, 2023

Cool, thanks. Ok, now that we're reading them both at the same time (schemata, to build the keyShape/valueShape checkers, and label, to build the error messages emitted if those checkers fail), it'll save a vatstore read to merge them together. I'll add a commit to do that.

Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK, but there are a few points I'm unclear on that I'd like to be clear on before approving.

packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
Comment on lines 83 to 89
return harden({ keyShape, valueShape, label, schemataCD });
};
/** @type {(collectionID: string, value: SchemaCacheValue) => void } */
const writeBacking = (collectionID, value) => {
const { label, schemataCD } = value;
const schemataKey = prefixc(collectionID, '|schemata');
const schemataValue = JSON.stringify(schemataCD);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return keyShape, valueShape, AND schemataCD from the read but stringify schemataCD in the write? If the consumer needs the shapes, having the schemataCD doesn't help, and when writing the producer (presumably) generates the keyShape and valueShape values and shouldn't concerned with how they're encoded. Am I missing something here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is preparation for #7321 (PR #7334), which needs to know about any slots (vrefs) in the serialized keyShape/valueShape. When the collection is deleted, rather than re-serialize those values (to learn the vrefs, so we can feed them to vrm.removeReachableVref, to decrement their refcounts), we keep the original serialized forms around. The memory overhead of retaining the serialized capdata is pretty small, and it didn't occur to me to trust that serialization would return the same values as the original.. felt more appropriate to retain the original instead.

The only time we write to the schemaCache is when a collection is first created, and at that time we also have the serialized capdata available (again, as anticipation for #7334, which needs to increment refcounts for everything therein).

packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/liveslots.js Show resolved Hide resolved
Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well OK then

@warner warner force-pushed the 6360-schema-cache branch from 082f244 to 83155ef Compare April 10, 2023 20:49
@warner warner force-pushed the 7353-double-free branch 2 times, most recently from 49c6e6b to 8b58027 Compare April 10, 2023 22:57
@warner warner force-pushed the 6360-schema-cache branch from 83155ef to 8a3ccc0 Compare April 10, 2023 22:57
@warner warner force-pushed the 7353-double-free branch from 8b58027 to d5114d2 Compare April 10, 2023 23:17
@warner warner force-pushed the 6360-schema-cache branch from 8a3ccc0 to 09a7d00 Compare April 10, 2023 23:17
@warner warner force-pushed the 7353-double-free branch from d5114d2 to 8316eb9 Compare April 11, 2023 01:19
Base automatically changed from 7353-double-free to master April 11, 2023 01:58
warner added 6 commits April 10, 2023 19:04
This removes `deleteAllVirtualCollections`, and the in-RAM
`allCollectionObjIDs` Set of all collectionIDs which supported
it. Once upon a time, `stopVat` called `deleteAllVirtualCollections`
to delete the vatstore data for all non-durable virtual
collections. We removed responsibility for deleting virtual data from
the vat (and stopped calling `stopVat` altogether) in the interests of
reducing upgrade risk, and improving upgrade performance, despite the
fact that this will cause a DB storage leak during upgrade (we figure
we can find a way to fix that later, and disk is cheap).

Now that `deleteAllVirtualCollections` is gone, we don't need to spend
the RAM on each collection, which violated our high-cardinality design
rules anyways.

refs #5058
This changes the collectionManager to keep the per-collection
metadata (keyShape, valueShape, label) in a cache. Like the VOM's
dataCache, this `schemaCache` is populated at most once per crank, and
flushed at the end of every crank. This removes the GC-sensitive
syscalls that used to fetch the metadata when userspace caused a
collection to be reanimated. Instead, the cache is read when userspace
invokes a collection method that needs the data (most of them, to
validate keys or values), regardless of whether the collection
Representative remains in RAM or not.

It also changes the serialized form of the schemata to be an object
like `{ keyShape, valueShape }` instead of an array like `[keyShape,
valueShape]`. If the constraints are missing, the object is empty,
which is smaller to serialize. I'm also thinking this might make it
more extensible.

closes #6360
Virtual collections need to record a label and a keyShape/valueShape
schema. Previously these were kept in two separate vatstore keys,
`|label` and `|schemata`. With this change, they're both stored in the
`|schemata` key. This should reduce the number of vatstore reads
necessary to work with a collection.
@warner warner force-pushed the 6360-schema-cache branch from 09a7d00 to 2faa0a3 Compare April 11, 2023 02:06
@warner warner added force:integration Force integration tests to run on PR automerge:rebase Automatically rebase updates, then merge labels Apr 11, 2023
@mergify mergify bot merged commit 24de1b6 into master Apr 11, 2023
@mergify mergify bot deleted the 6360-schema-cache branch April 11, 2023 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge force:integration Force integration tests to run on PR liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet
Projects
None yet
3 participants